perm filename TURING.TEX[W83,JMC] blob sn#695871 filedate 1983-01-13 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00004 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	\magnify {1300}
C00009 00003		The tape space is not required to be either homogeneous or isotropic.
C00016 00004		At this point it becomes unclear how to proceed further.  What has
C00029 ENDMK
C⊗;
\magnify {1300}
\parskip 6pt plus 6pt minus 3pt
\vskip 20pt
\ctrline{THE INVERSION OF FUNCTIONS DEFINED BY TURING MACHINES}
\vskip 14pt	
\ctrline 		{John McCarthy}
\vskip 14pt	


	Consider the problem of designing a machine to solve well-defined
intellectual problems.  We call a problem well-defined if there is a test
which can be applied to a proposed solution.  In case the proposed solution
is a solution, the test must confirm this in a finite number of steps.  If
the proposed solution is not correct, we may either require that the test
indicate this in a finite number of steps or else allow it to go on indefinitely.
Since any test may be regarded as being performed by a Turing machine,
this means that well-defined intellectual problems may be regarded as those
of inverting functions and partial functions defined by Turing machines.
	
	Let $f↓m(n)$ be the partial function computed by the $m↑{\rm th}$ Turing
machine.  It is not defined for a given value of $n$ if the computation does
not come to an end.  This paper deals with the problem of designing a Turing
machine which, when confronted by the number pair $(m,r)$, computes as 
efficiently as possible a function $g(m,r)$ such that $f↓m(g(m,r)) = r$.  Again,
for particular values of $m$ and $r$ no $g(m,r)$ need exist.  In fact, it has
been shown that the existence of $g(m,r)$ is an undecidable question in
that there does not exist a Turing machine which will eventually come to a
stop and print a $1$ if $g(m,r)$ does not exist.

	In spite of this, it is easy to show that a Turing machine exists
which will compute a $g(m,r)$ if such exists.  Essentially, it substitutes
integers in $f↓m(n)$ until it comes to one such that $f↓m(n) = r$.  It will
therefore find $g(m,r)$ if it exists, but will never know enough to give
up if $g(m,r)$ does not exist.  Since the computation of $f↓m(n)$ may not 
terminate for some $n$, it is necessary to avoid getting stuck on such $n$'s.
Hence the machine calculates the numbers $f↓m↑k(n)$ in some order where $f↓m↑k(n)$
is $f↓m(n$) if the computation of $f↓m(n)$ ends after $k$ steps and is otherwise
undefined.

	Our problem does not end once we have found this procedure for
computing $g(m,r)$ because this procedure is extremely inefficient.  It
corresponds to looking for a proof of a conjecture by checking in some
order all possible English essays.

	In order to do better than such a purely enumerative method, it
is necessary to use methods which take into account some of the structure
of Turing machines.  But before discussing this, we must attempt to be
somewhat (only somewhat) more precise about what is meant by efficiency.

	The most obvious idea is to say that if $T↓1$ and $T↓2$ are two
Turing machines each computing a $g(m,r)$, then for a particular $m$ and
$r$ the more efficient one is the one which carries out the computation in 
the fewest possible steps.  However, this won't do since for any Turing machine
there is another one which does $k$ steps of the original machine in one step.
However, the new machine has many more different kinds of symbol than the old.
It is probably also possible to increase the speed by increasing the number
of internal states, though this is not so easy to show.  (Shannon shows
elsewhere in these studies that it is possible to reduce the number of 
internal states to two at the cost of increasing the number of symbols and
reducing the speed.)

	Hence we offer the following revised definition of the length of
a computation performed by a Turing machine.  For any universal Turing
machine there is a standard way of recoding on it the computation performed
by the given machine.  Let this computation be recoded on a fixed universal
Turing machine and count the number of steps in the new computation.  There
are certain difficulties here connected with the fact that the rate of 
computation is limited if the tape of the universal machine is finite 
dimensional, and hence the rate should probably be defined with respect to a 
machine whose tape is infinite dimensional but each square of which has at
most two states and which has only two internal states.  This requires a
mild generalization of the concept of Turing machines.
	The tape space is not required to be either homogeneous or isotropic.
We hope to make these considerations precise in a later paper.  For now, we
only remark that dimensionality is meant to be
$$\lim↓{n→∞}{\log {V↓n} \over \log n}$$
where $V↓n$ is the number of squares accessible to a given one in $n$ steps.
It will be made at least plausible that a machine with $Q$ internal states
and $S$ symbols should be considered as making about ${1\over 2} \log QS$ elementary 
steps per step of computation and hence the number of steps in a computation
should be multiplied by this factor to get the length of the computation.

	Having now an idea of what should be meant by the length $l(m,r,T)$
of a particular computation of $g(m,r)$ by the machine $T$, we can return
to the question of comparing two Turing machines.  Of course, if
$l(m,r,T↓1) ≤ l(m,r,T↓2)$ for all $m$ and $r$ for which these numbers are
finite, we should certainly say that $T↓1$ is more efficient than $T↓2$.  We
only consider machines that actually compute $g(m,r)$ whenever it exists.
Any machine can be modified into such a machine by adding to it facilities
for testing a conclusion and having it spend a small fraction of its time
trying the integers in order.  However, it is not so easy to give a function
which gives an over-all estimate of the efficiency of a machine at computing
$g(m,r)$.  The idea of assigning a weight function $p(m,r)$ and then calculating
$$\sum↓{m,r} p(m,r)\hskip 1.5pt l(m,r,T)$$
does not work very well because $l(m,r,T)$ is not bounded by any recursive
function of $m$ and $r$.  (Otherwise, a machine could be described for 
determining whether the computation of $f↓m(n)$ terminates.  It would simply carry
out some $k(m,n)$ steps and conclude that if the computation had not terminated
by this time it was not going to.)  There cannot be any machine which
is as fast as any other machine on any problem because there are rather
simple machines whose only procedure is to guess a constant which are fast
when $g(m,r)$ happens to equal that constant.

	We now return to the question of how to design machines which make
use of information concerning the structure of Turing machines.  The 
straightforward approach would be to try to develop a formalized theory of Turing
machines in which length of a computation is defined and then try to get a
decision procedure for this formal theory.  This is known to be a hopeless
task.  Systems much simpler than Turing machine theory have been shown to
have unsolvable decision procedures.  So, we look for a way of evading these
difficulties.

	Before discussing how this may be done, it is worthwhile to bring
up some more enumerative procedures.  First, let $f↓k(x,y)$ be the function
of two variables computed by the $k↑{\rm th}$ Turing machine. Our procedure consists
in trying the numbers $f↓k(m,r)$ in order (again diagonalizing to avoid
computations which don't end.)  This is based on the plausible idea that, in
searching for the solution to a problem, the given data should be taken into
account.

	The next complication which suggests itself is to revise the order
in which recursive functions are considered.  One way is to consider
$f↓{f↓k(l)}(m,r)$ diagonalized on $k$ and $l$.  This is based on the idea that
the best procedure is more likely to be recursively simple, rather than
merely to have a low number in the ordering.  More generally, the enumeration
of partial recursive functions should give an early place to compositions
of the functions which have already appeared.

	This process of elaborating the schemes by which numbers are tested
can be carried much further.  Intuitively it seems that each successive 
complication improves the performance on difficult problems, but imposes a delay
in getting started.

	The difficulty with the afore-mentioned methods and their elaborations
is that they have no concept of partial success.  It is a great advantage in
conducting a search to be able to know when one is close.  This suggests
first of all that a function $f↓k(x,y)$ be tried out first on simpler problems
known to have simple solutions before very many steps of the computation
$f↓k(m,r)$ are carried out.
	At this point it becomes unclear how to proceed further.  What has
been done is only a semiformal description of a few of the processes common
to scientific reasoning and we have no guarantee of being exhaustive.  Of
course, exhaustiveness in examining for usefulness can be attained for any
effectively enumerable class of objects simply by going through the enumeration
However, enumerative methods are always inefficient.  What is needed
is a general procedure which ensures that all relevant concepts which can be
computed with are examined and that the irrelevant are eliminated as rapidly
as possible.  Here we remark that a property is useful mainly when it permits
a new enumeration of the objects possessing it and not merely a test which
can be applied.  With the enumeration the objects not possessing the property
need not even be examined.  Of course, it is then necessary to be able to
express all other relevant properties as functions of the new enumeration.

	In order to get around the fact that all formal systems which are
anywhere near adequate for describing recursive function theory are incomplete,
we avoid restriction to any one of them by introducing the notion of a formal
theory (not for the first time, of course).

	For our purposes, a formal theory is a decidable predicate
$V(p,t↓1 \ldots , t↓k → t)$ which is to be regarded as meaning that $p$ is a
proof that the statements $t↓1,\ldots ,t↓k$ imply the statement $t$ in the theory.
We do not require that statements have negatives or disjunctions, although
theories with these and other properties of propositional calculi will 
presumably belong to the most useful theories.

	An interpretation of a formal theory is a computable function $i(t)$
mapping a class of statements of the theory into statements of other theories
or concrete propositions.  The only kind of concrete proposition which we
shall mention for now is $P(m,n,r,k)$ meaning $f↓m(n) = r$ in a computation
of length $≤ k$.  Such a proposition is of course verifiable.

	The theories and interpretations of them can be enumerated.  A
function of their G\"odel numbers is called a status function.  (To be regarded
as a current estimate of their validity and relevance, etc.)  An action
scheme is a computation rule which computes from a status function (its
G\"odel number) a new status function, perhaps gives an estimate of $g(m,r)$ if
it has determined this, and computes a new action scheme.

	REMARK 1.  A status function would consist primarily of estimates
of the validity of the separate theories and would place theories which had
not been examined in an ``unknown'' category.  However, an action scheme might
modify the status of a whole class of theories simultaneously in some 
systematic way.

	REMARK 2.  The reader should note that a formal equivalence could 
probably be established between formal theories and certain action schemes.
Thus, knowledge can be interpreted as a predisposition to act in certain
ways under given conditions, and an action scheme can be interpreted as a
belief that certain action is appropriate under these conditions.  This
equivalence should not be made because a simple theory may have a very 
complicated interpretation as an action scheme and conversely, and in this
inversion problem the simplicity of an object is one of its most important
properties.
	This suggests an Alexandrian solution to a knotty question.  Perhaps,
a machine should be regarded as thinking if and only if its behavior can
most concisely be described by statements, including those of the form
``It now believes Pythagoras' theorem.''  This would not be the case for a
phonograph record of a proof of Pythagoras' theorem.  Such statements will
probably be especially applicable to predictions of the future behavior
under a variety of stimuli.

	REMARK 3.  In determining a procedure which has action schemes,
theories, etc., we are imposing our ideas of the structure of the problem
on the machine.  The point is to do this so that the procedures we offer are
general (will eventually solve every solvable problem) and also are improvable
by the methods built into the machine.

	REMARK 4.  Except for the concrete proposition none of the statements
of a formal theory have necessary interpretations which are stateable
in advance.  However, if the machine were working well, an observer of its
internal mechanism might come to the conclusion that a certain statement of
a theory is being used as though it were, say  $∀x. f↓m(x) = 0$, etc.  An
important merit of a particular theory might be that in it the verification
of a proof at the correlate of a certain concrete proposition is much shorter
than the computation of the concrete proposition, i.e., shorter than $k$ in
$P(m,n,r,k)$.

	REMARK 5.  The relation of certain action schemes to certain
theories might warrant our regarding certain statements in them as being
normative, i.e, of the form ``the next axiom scheme should be no. $m$'' or
``theory $k$ should be dropped from consideration.''

	REMARK 6.  Not every worthwhile problem is well-defined in the 
sense of this paper.  In particular, if there exist more or less satisfactory
answers with no way of deciding whether an answer already obtained can be
improved on a reasonable time, the problem is not well-defined.

\vskip 12pt
{\bf 1982 Commentary:}

	This paper is reprinted from Automata Studies edited by C. E. 
Shannon and J. McCarthy, Princeton University Press, 1956.

The paper  was written in the summer of 1952 and elaborated
somewhat before it was published in 1956.   It was written to clarify
the meaning of ``problem'' and to discuss problem solving methods that
work directly from the definition of problem.  A problem was
called well-defined if a procedure was given for testing a
proposed solution, and the problem solving methods worked backwards
from this test.
Thus no auxiliary facts of general common sense knowledge or of
the specific domain are used.

	Even at the time the idea didn't seem promising.  I cannot
reconstruct my state of mind at the time, but I spent most of my
time thinking about methods that involved knowledge of facts, and
the ideas that were first published with my 1958 paper ``Programs
with Common Sense''.  However, I was unable to make these ideas sufficiently
definite in the summer of 1952, so I decided to explore this more
limited domain.

	There is nothing about computers
in the ``Inversion ... '' paper.  This is because I, in common at least
with Minsky and Shannon with whom I discussed the ideas, thought about
artificial intelligence in terms of machines, not programmable computers,
about which we knew little.  Even von Neumann, who did know about
programmable computers, seemed to think about thinking machines
exclusively as objects analogous to the human brain.  To my knowledge,
only Turing then considered programming computers as the technique
for realizing artificial intelligence.  Newell and Simon were next
in 1954, and I switched my attention to computers in 1955, during a
summer spent working for IBM in Poughkeepsie, New York.

	Perhaps some part of the idea could be rehabilitated, but
it would seem appropriate to use Lisp functions or logic programs
as the way to represent the function to be inverted.  The paper
does not discuss different representations of the same function
nor does it discuss the natural restriction (as long as one hand
is to be tied behind the back) of using the function to be inverted
only extensionally, i.e. by computing values, although some of the
methods observe this restriction.
\vfill
\end